age : age of the patient
sex : sex of the patient
exang: exercise induced angina (1 = yes; 0 = no)
ca: number of major vessels visible in fluoroscopy (0-3)
cp : chest pain type. Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic
trtbps : resting blood pressure (in mm Hg)
chol : cholesterol in mg/dl fetched via BMI sensor
fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
rest_ecg : resting electrocardiographic results Value 0: normal Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV) Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
thalach : maximum heart rate achieved
thal : Thallium Stress Test result
slope : the slope of the peak exercise ST segment (2 = upsloping; 1 = flat; 0 = downsloping)
oldpeak : ST depression induced by exercise relative to rest
Find any two observations in the dataset, such that they have different variables of the highest importance, e.g. age and gender have the highest (absolute) attribution for observation A, but race and class are more important for observation B.
Waterfall plots of the SHAP values of observations:


The observations have different most-important variables. For the first observation it is oldpeak, which is quite high, while for the second it is ca, the fact that no major vessels are visible on the fluoroscopy.
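Finding such a pair can be sketched in plain numpy; a minimal sketch, assuming `attributions` is an (observations × features) matrix of SHAP values (the helper name `differing_top_pair` is hypothetical):

```python
import numpy as np

def differing_top_pair(attributions):
    """Return indices (i, j) of two observations whose highest-|SHAP|
    feature differs, or None if every observation shares the same one."""
    tops = np.abs(attributions).argmax(axis=1)
    for i in range(len(tops)):
        for j in range(i + 1, len(tops)):
            if tops[i] != tops[j]:
                return i, j
    return None

# toy attribution matrix: rows = observations, columns = features
vals = np.array([[0.7, -0.1, 0.2],
                 [0.1, -0.6, 0.3]])
print(differing_top_pair(vals))  # (0, 1): feature 0 dominates the first row, feature 1 the second
```

The same function applied to the real `shap_values.values` matrix would locate a qualifying pair automatically instead of by visual inspection of the waterfall plots.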
(If possible) Select one variable X and find two observations in the dataset such that for one observation, X has a positive attribution, and for the other observation, X has a negative attribution.
For the same observations as before we can see that in the first one oldpeak is quite high and has a negative attribution to the prediction, while in the second observation oldpeak is low and its attribution is positive.
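Such a sign flip can also be searched for programmatically; a minimal sketch over a toy attribution matrix (the helper name `opposite_sign_pair` is hypothetical):

```python
import numpy as np

def opposite_sign_pair(attributions, feature_idx):
    """Return indices (i, j) such that the attribution of the given feature
    is positive for observation i and negative for observation j."""
    col = attributions[:, feature_idx]
    pos = np.flatnonzero(col > 0)
    neg = np.flatnonzero(col < 0)
    if len(pos) == 0 or len(neg) == 0:
        return None  # the feature never changes sign in this sample
    return int(pos[0]), int(neg[0])

vals = np.array([[-0.3, 0.1],
                 [0.2, 0.4],
                 [0.1, -0.2]])
print(opposite_sign_pair(vals, 0))  # (1, 0)
```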
(How) Do the results differ across the two packages selected in point (3)?
The results from dalex are presented on the following plots:

We can see that the results are quite similar. Although there is some randomness involved, the most important variable is the same in both cases, and the feature attributions have the same signs in both packages.
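The claim about matching signs can be quantified; a minimal sketch, assuming the two packages' attributions have been collected into same-shape matrices (the helper name `sign_agreement` is hypothetical):

```python
import numpy as np

def sign_agreement(attr_a, attr_b):
    """Fraction of entries whose attribution sign matches between
    two explanation matrices of the same shape."""
    return float((np.sign(attr_a) == np.sign(attr_b)).mean())

# toy attributions from two explanation packages
a = np.array([[0.5, -0.2], [0.1, 0.3]])
b = np.array([[0.4, -0.1], [-0.2, 0.6]])
print(sign_agreement(a, b))  # 0.75: one of four entries disagrees
```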
(Using one explanation package of choice) Train another model of any class: neural network, linear model, decision tree etc. and find an observation for which SHAP attributions are different between this model and the one trained in point (1).


The first plot was created for the random forest classifier and the second one for the logistic regression. We can clearly see that the logistic regression relies much more on the ca feature than the forest does.
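Picking the observation on which the two models disagree most can be automated; a minimal sketch over toy SHAP matrices (the helper name `most_divergent_obs` is hypothetical):

```python
import numpy as np

def most_divergent_obs(shap_a, shap_b):
    """Index of the observation on which two models' attributions
    differ most, measured by the L1 distance between SHAP rows."""
    diffs = np.abs(shap_a - shap_b).sum(axis=1)
    return int(diffs.argmax())

# toy SHAP values for the same two observations under two models
forest_vals = np.array([[0.2, 0.1], [0.3, -0.4]])
lr_vals     = np.array([[0.1, 0.2], [-0.5, 0.4]])
print(most_divergent_obs(forest_vals, lr_vals))  # 1
```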
Comment on the results obtained in points (4)-(7)
I think that the results are pretty much what was to be expected.
When it comes to task 4: if we take two observations, one of which has an extreme value of some feature, while the other has a value close to the average for that feature but extreme values of some other features, then it is quite logical that the model attends to the features with extreme values, because they carry a lot of information about the observation.
Regarding task 5: if we take two observations with very different values of some feature, then we should probably expect one attribution to be positive and the other negative.
When it comes to the numerical results, I think they are very similar, but I have to say that I find the shap plots more visually appealing.
It makes sense that different models weight the features differently. Some models focus mostly on a subset of features (like linear models with an L1 penalty), while other models always look at all of them.
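The point about the L1 penalty can be checked directly; a minimal sketch on synthetic data where only the first two features carry signal (the data and hyperparameters here are illustrative, not from the heart dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# only the first two features influence the label; the rest are noise
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# strong L1 penalty drives noise coefficients to exactly zero;
# the L2 penalty only shrinks them toward zero
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)
l2 = LogisticRegression(penalty='l2', C=0.1).fit(X, y)

print('L1 zero coefficients:', int((l1.coef_ == 0).sum()))
print('L2 zero coefficients:', int((l2.coef_ == 0).sum()))
```

Features with exactly-zero coefficients receive zero attribution from the linear model, which is one way the two models' SHAP profiles can diverge.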
import pandas as pd
import sklearn
import sklearn.metrics
import sklearn.model_selection
from sklearn import ensemble
import dalex as dx
import shap
dataset = pd.read_csv('heart.csv')
dataset = pd.get_dummies(dataset)
dataset
features = dataset.drop(columns='output')
# fix column-name typos in the data
features = features.rename(columns={'thalachh': 'thalach', 'slp': 'slope', 'caa': 'ca'})
features = pd.get_dummies(features, columns=['cp', 'thall'])
features
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(features, dataset['output'], test_size=0.3, random_state=0)
X_train
forest = sklearn.ensemble.RandomForestClassifier()
forest.fit(X=X_train,y=y_train)
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_test,forest.predict(X_test))}')
print(f'Recall: {sklearn.metrics.recall_score(y_test,forest.predict(X_test))}')
print(f'Precision: {sklearn.metrics.precision_score(y_test,forest.predict(X_test))}')
forest_accuracy = sklearn.metrics.accuracy_score(y_test,forest.predict(X_test))
forest_recall = sklearn.metrics.recall_score(y_test,forest.predict(X_test))
forest_precision = sklearn.metrics.precision_score(y_test,forest.predict(X_test))
print('\nResults on train dataset:')
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_train,forest.predict(X_train))}')
print(f'Recall: {sklearn.metrics.recall_score(y_train,forest.predict(X_train))}')
print(f'Precision: {sklearn.metrics.precision_score(y_train,forest.predict(X_train))}')
len(X_test)
obs = [0, 1]
obs1 = X_test.iloc[[obs[0]]]  # keep as a DataFrame so feature names are preserved
print(obs1)
forest.predict(obs1)
obs2 = X_test.iloc[[obs[1]]]
print(obs2)
forest.predict(obs2)
model = forest
X = X_test
y = y_test
predict = lambda m, d: m.predict(d)
explainer = dx.Explainer(forest, X_test, y_test, predict_function=predict, label="Random Forest")
explainer.model_performance()
shap_attributions = [explainer.predict_parts(X.iloc[[i]], type="shap", label=f'Person {i}') for i in obs]
shap_attributions
shap_attributions[0].plot(shap_attributions[1:])
bd_attributions = [explainer.predict_parts(X.iloc[[i]], type="break_down", label=f'Person {i}') for i in obs]
bd_attributions[0].plot(bd_attributions[1:])
X
shap_explainer = shap.explainers.Tree(forest, data=X, model_output="probability")
shap_values = shap_explainer(X)[:,:,1]
shap_values
for i in range(10):
    shap.plots.waterfall(shap_values[i])
shap_values_forest = shap_values[4]
shap.plots.beeswarm(shap_values, max_display=10, plot_size=(9, 6))
import matplotlib.pyplot as plt
# plots.bar() has no plot_size parameter
shap.plots.bar(shap_values, max_display=10, show=False)
plt.gcf().set_size_inches(9, 6)
plt.show()
import sklearn.linear_model
model = sklearn.linear_model.LogisticRegression(max_iter=500)
model.fit(X=X_train,y=y_train)
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_test,model.predict(X_test))}')
print(f'Recall: {sklearn.metrics.recall_score(y_test,model.predict(X_test))}')
print(f'Precision: {sklearn.metrics.precision_score(y_test,model.predict(X_test))}')
model_accuracy = sklearn.metrics.accuracy_score(y_test,model.predict(X_test))
model_recall = sklearn.metrics.recall_score(y_test,model.predict(X_test))
model_precision = sklearn.metrics.precision_score(y_test,model.predict(X_test))
print('\nResults on train dataset:')
print(f'Accuracy: {sklearn.metrics.accuracy_score(y_train,model.predict(X_train))}')
print(f'Recall: {sklearn.metrics.recall_score(y_train,model.predict(X_train))}')
print(f'Precision: {sklearn.metrics.precision_score(y_train,model.predict(X_train))}')
shap_explainer = shap.Explainer(lambda x: model.predict_proba(x)[:, 1], X)
shap_values = shap_explainer(X)
shap_values
for i in range(10):
    shap.plots.waterfall(shap_values[i])
shap_values_lr = shap_values[4]
shap.plots.waterfall(shap_values_forest)
shap.plots.waterfall(shap_values_lr)